Migrate antithesis to K8s #21541

Open
serathius wants to merge 1 commit into etcd-io:main from serathius:antithesis-k8s

@serathius
Member

Ref #20572

@k8s-ci-robot k8s-ci-robot added approved area/testing github_actions Pull requests that update GitHub Actions code size/XL labels Mar 28, 2026
@serathius serathius force-pushed the antithesis-k8s branch 6 times, most recently from 257cc34 to e37b876 Compare March 28, 2026 21:51
@codecov

codecov bot commented Mar 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.40%. Comparing base (5aebee8) to head (dec1e4e).
⚠️ Report is 41 commits behind head on main.

Additional details and impacted files

see 27 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21541      +/-   ##
==========================================
- Coverage   68.45%   68.40%   -0.05%     
==========================================
  Files         428      429       +1     
  Lines       35383    35394      +11     
==========================================
- Hits        24221    24212       -9     
- Misses       9761     9772      +11     
- Partials     1401     1410       +9     


@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nwnt, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius
Member Author

We have a passing run https://linuxfoundation.antithesis.com/report/gnMFNSkbTUOkpLxqq7y_vP-l/oQYD0X8p0pl3UkSCmRM-6b0L8McEnD0St5u351n-iKw.html?auth=v2.public.eyJzY29wZSI6eyJSZXBvcnRTY29wZVYxIjp7ImFzc2V0Ijoib1FZRDBYOHAwcGwzVWtTQ21STS02YjBMOE1jRW5EMFN0NXUzNTFuLWlLdy5odG1sIiwicmVwb3J0X2lkIjoiZ25NRk5Ta2JUVU9rcEx4cXE3eV92UC1sIn19LCJuYmYiOiIyMDI2LTAzLTI5VDIwOjA2OjUyLjM1NDAyMjE4NFoifbIaxbqyCV__bSDACYGii7iPrj-1nSJKH9zfFm3vd0YSkNYdHaRKQQYz_iRrGumWp3Kv_mfL6hm0r_MH8J8BjwY

It confirmed that antithesis.duration still accepts hours and not minutes.

Compared with previous docker-compose runs, it didn't report the following assertion errors:

  • container: etcd1, exit code: 1 - that's good, we don't care about this one
  • consistent_index isn't equal to snapshot index - could imply different failure injection and worse exploration?
  • Linearization timeout - could imply less aggressive failure injection, but might show up if we increase test duration.

cc @marcus-hodgson-antithesis could you help review this migration, and also ask someone from Antithesis who specializes in K8s to take a look? It would be good to confirm that we will not regress our ability to reproduce issues.

@marcus-hodgson-antithesis
Contributor

@serathius the basic-k8s-test kubernetes endpoint only runs with network faults. The container faults / other faults that we currently have enabled on the docker setup, haven't been enabled yet on this kubernetes one.

I'm in the process of creating a custom endpoint for etcd where we can move over the faults we have enabled in the current setup to this new one.

@serathius
Member Author

The container faults / other faults that we currently have enabled on the docker setup, haven't been enabled yet on this kubernetes one.

Can we enable them and just migrate to K8s?

@marcus-hodgson-antithesis
Contributor

Yep, I've created a custom endpoint with the faults we had for docker @serathius

I've added a comment on the line of code we'd need to change to use it!

@serathius
Member Author

I've added a comment on the line of code we'd need to change to use it!

Don't see comment, might need to publish?

  uses: antithesishq/antithesis-trigger-action@f6221e2ba819fe0ac3e36bd67a281fa439a03fba # v0.10
  with:
-   notebook_name: etcd
+   notebook_name: basic_k8s_test
Contributor

Let's do "etcd_k8s" here. This is the custom endpoint I've created for kubernetes that includes the faults we had for the docker setup

@marcus-hodgson-antithesis
Contributor

Yep, you were right. Forgot to publish the comment -_-

Signed-off-by: Marek Siarkowicz <siarkowicz@google.com>
@serathius
Member Author

serathius commented Apr 8, 2026

Forgot to publish the comment -_-

Yeah, not the first time GitHub has caused me unnecessary friction.

@serathius
Member Author

/retest

@k8s-ci-robot

@serathius: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name              Commit   Details  Required  Rerun command
pull-etcd-govulncheck  dec1e4e  link     true      /test pull-etcd-govulncheck

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@serathius
Member Author

serathius commented Apr 8, 2026

The same configuration that worked for basic_k8s_test (basic_k8s_test run) doesn't work with etcd_k8s (etcd_k8s run).

Getting error:

7:01:57AM: ---- waiting on 2 changes [1/3 done] ----
7:02:00AM: ongoing: reconcile deployment/etcd-client (apps/v1) namespace: default
7:02:00AM:  ^ Waiting for 1 unavailable replicas
7:02:00AM:  L ok: waiting on replicaset/etcd-client-84b8d9b746 (apps/v1) namespace: default
7:02:00AM:  L ongoing: waiting on pod/etcd-client-84b8d9b746-7t5f8 (v1) namespace: default
7:02:00AM:     ^ Pending: ErrImageNeverPull, message: Container image "etcd-client:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b" is not present with pull policy of Never
7:02:00AM: ongoing: reconcile statefulset/etcd (apps/v1) namespace: default
7:02:00AM:  ^ Waiting for 3 replicas to be ready
7:02:00AM:  L ongoing: waiting on pod/etcd-2 (v1) namespace: default
7:02:00AM:     ^ Pending: ErrImageNeverPull, message: Container image "etcd-server:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b" is not present with pull policy of Never
7:02:00AM:  L ongoing: waiting on pod/etcd-1 (v1) namespace: default
7:02:00AM:     ^ Pending: ErrImageNeverPull, message: Container image "etcd-server:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b" is not present with pull policy of Never
7:02:00AM:  L ongoing: waiting on pod/etcd-0 (v1) namespace: default
7:02:00AM:     ^ Pending: ErrImageNeverPull, message: Container image "etcd-server:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b" is not present with pull policy of Never

It's also possible that something broke between versions 48 and 50:

  • Successful run 52aa7f48caa06b031ccb1af3a8e21713-48-5
  • Failed run 8b346466af5340f038a00a847fafbce0-50-7

@serathius
Member Author

Talked with @marcus-hodgson-antithesis; I was wrong about the previous run using K8s. It looks like only the new run used K8s, and everything looks correct even though it didn't work. The action item is on the Antithesis side to debug what went wrong.

@marcus-hodgson-antithesis
Contributor

Okay I've figured it out!

It looks like kapp is looking for an absolute image path. So, in Antithesis, we have the images:

us-central1-docker.pkg.dev/molten-verve-216720/linuxfoundation-repository/etcd-server:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b

and

us-central1-docker.pkg.dev/molten-verve-216720/linuxfoundation-repository/etcd-client:antithesis_dec1e4e969995813f82ffe8da80358c636b4992b.

This makes sense because we are passing through those images in the antithesis.images parameter when kicking off a test (here).

@nwnt or @serathius, is there any way the manifests can use this absolute path when they run in Antithesis (here)?

@nwnt
Member

nwnt commented Apr 10, 2026

Yeah, I think so. We could perhaps use something like kustomize to patch the manifests for running in the Antithesis environment. What do you think, @serathius?
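
For illustration, a minimal sketch of what a kustomize overlay for this could look like. It is not taken from the PR: the file location and the `../../base` directory are hypothetical, and it assumes the existing manifests reference the images by the short names `etcd-server` and `etcd-client`, as the error messages above suggest. The built-in `images` transformer replaces only the image name when `newTag` is omitted, so the existing `antithesis_<commit>` tag would be kept.

```yaml
# overlays/antithesis/kustomization.yaml (hypothetical path, not from the PR)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Assumption: the existing etcd manifests live here; adjust to the real location.
  - ../../base

images:
  # Rewrite the short image names to the absolute paths that exist in Antithesis.
  # newTag is intentionally omitted so the tag from the base manifests is preserved.
  - name: etcd-server
    newName: us-central1-docker.pkg.dev/molten-verve-216720/linuxfoundation-repository/etcd-server
  - name: etcd-client
    newName: us-central1-docker.pkg.dev/molten-verve-216720/linuxfoundation-repository/etcd-client
```

Rendering the overlay with `kubectl kustomize overlays/antithesis` (or `kustomize build overlays/antithesis`) should emit manifests with the absolute image paths. Whether kapp in the Antithesis environment then resolves them with a pull policy of Never has not been verified here.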

Labels

approved area/testing github_actions Pull requests that update GitHub Actions code size/L

4 participants